$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation

  • 2025-07-17 15:40:18
  • Junhong Min, Youngpil Jeon, Jimin Kim, Minyong Choi

Abstract

The pursuit of a generalizable stereo matching model, capable of performing across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. On the other hand, global matching architectures, while theoretically more robust, have been historically rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with $S^2M^2$: a global matching architecture that achieves both state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. $S^2M^2$ establishes a new state of the art on the Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods across most metrics while reconstructing high-quality details with competitive efficiency.

 
